Maxwell distribution (maxwell)#
The Maxwell distribution describes the distribution of the magnitude (speed) of a 3‑D vector whose components are independent zero-mean Gaussians. It is central in kinetic theory (Maxwell–Boltzmann molecular speeds) and appears whenever you care about the norm of an isotropic Gaussian vector.
What you’ll learn#
how Maxwell arises as a 3‑D Gaussian norm (and as a χ distribution)
the PDF/CDF in closed form (with the error function `erf`)
moments, MGF/characteristic function, and entropy
the MLE for the scale parameter and how it relates to `scipy.stats.maxwell`
sampling with NumPy only, and diagnostic plots
Notebook roadmap#
Title & classification
Intuition & motivation
Formal definition
Moments & properties
Parameter interpretation
Derivations (expectation, variance, likelihood)
Sampling & simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy integration
Statistical use cases
Pitfalls
Summary
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from scipy import stats
from scipy.special import erf, erfi
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
Prerequisites & notation#
We write \(X \sim \mathrm{Maxwell}(a)\) with a scale parameter \(a>0\).
In physics you often see \(a = \sqrt{k_B T/m}\), where \(T\) is temperature and \(m\) is particle mass.
In SciPy the parameterization is `stats.maxwell(loc=0, scale=a)`.
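To make the physical parameterization concrete, here is a back-of-envelope sketch; the molecule (nitrogen) and temperature are illustrative values chosen here, not taken from this notebook:

```python
import numpy as np

# Hypothetical example: N2 gas at room temperature (values chosen for illustration).
k_B = 1.380649e-23   # Boltzmann constant, J/K
m_N2 = 4.6518e-26    # mass of one N2 molecule, kg (~28.013 u)
T = 300.0            # temperature, K

a = np.sqrt(k_B * T / m_N2)            # Maxwell scale parameter, m/s
mean_speed = 2 * a * np.sqrt(2 / np.pi)  # E[X] = 2a*sqrt(2/pi)

print(f"a          = {a:.0f} m/s")
print(f"mean speed = {mean_speed:.0f} m/s")
```

The mean speed of roughly 470–480 m/s matches the familiar kinetic-theory value for nitrogen at room temperature.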
A key structural fact (the main source of intuition and derivations):
If \(Z_1,Z_2,Z_3 \overset{\text{iid}}{\sim} \mathcal{N}(0,1)\) and \(R = \sqrt{Z_1^2 + Z_2^2 + Z_3^2}\), then \(R\) has a **chi** distribution with \(k=3\) degrees of freedom.
Scaling by \(a\) gives a Maxwell random variable: \(X = aR\).
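A quick numerical sanity check of this construction (the scale value is arbitrary): scaled norms of 3‑D standard normals should pass a KS test against SciPy's Maxwell.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = 1.7

# R = norm of three iid standard normals; X = a*R should be Maxwell(a)
z = rng.standard_normal(size=(100_000, 3))
x = a * np.linalg.norm(z, axis=1)

ks = stats.kstest(x, stats.maxwell(scale=a).cdf)
print(ks.statistic, ks.pvalue)  # small statistic, non-small p-value expected
```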
A very useful equivalent statement is about the square: if \(X\sim\mathrm{Maxwell}(a)\), then
\(X^2 \sim \mathrm{Gamma}\!\left(\alpha=\tfrac{3}{2},\ \theta=2a^2\right),\)
where we use the shape–scale Gamma parameterization.
1) Title & classification#
| Item | Value |
|---|---|
| Name | Maxwell (`maxwell`) |
| Type | Continuous |
| Support | \(x \in [0,\infty)\) |
| Parameter space | \(a>0\) (scale) |
Note: many references call the scale parameter \(\sigma\) instead of \(a\). SciPy’s `scale` corresponds to \(a\) in this notebook.
2) Intuition & motivation#
What it models#
Maxwell is the distribution of speed when a 3‑D velocity vector is modeled as
isotropic (no preferred direction), and
Gaussian in each component.
Concretely, if \(V = (V_x,V_y,V_z)\) with \(V_x,V_y,V_z \overset{\text{iid}}{\sim} \mathcal{N}(0,a^2)\), then the speed
\(S = \lVert V\rVert = \sqrt{V_x^2 + V_y^2 + V_z^2}\)
follows \(\mathrm{Maxwell}(a)\). This is exactly the Maxwell–Boltzmann speed distribution used in kinetic theory.
Typical real‑world use cases#
Molecular speeds in an ideal gas (kinetic theory / thermodynamics).
Magnitudes of 3‑D measurement noise, e.g. the norm of a 3‑axis Gaussian sensor error.
Random-walk / diffusion settings where the 3‑D velocity increments are modeled as normal.
Relations to other distributions#
Chi distribution: \(X/a \sim \chi_{k=3}\).
Chi-square: \((X/a)^2 \sim \chi^2_3\).
Gamma: \(X^2 \sim \mathrm{Gamma}(3/2,\ 2a^2)\).
Dimensional analogues:
in 1D, \(|\mathcal{N}(0,a^2)|\) is half-normal;
in 2D, the radius of an isotropic Gaussian is Rayleigh;
in \(k\) dimensions, the radius is chi with \(k\) degrees of freedom.
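The dimensional analogues above can be checked directly against SciPy's `chi` distribution (a small sketch at unit scale; each pair of PDFs should agree pointwise):

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 4.0, 50)

# chi with k degrees of freedom vs the named special cases (all at scale 1)
pairs = [
    (stats.chi(1), stats.halfnorm()),  # k=1: half-normal
    (stats.chi(2), stats.rayleigh()),  # k=2: Rayleigh
    (stats.chi(3), stats.maxwell()),   # k=3: Maxwell
]
for chi_k, named in pairs:
    print(np.allclose(chi_k.pdf(x), named.pdf(x)))  # True for each pair
```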
3) Formal definition#
PDF#
For \(a>0\) and \(x\ge 0\):
\(f(x\mid a) = \sqrt{\dfrac{2}{\pi}}\,\dfrac{x^2}{a^3}\,\exp\!\left(-\dfrac{x^2}{2a^2}\right),\)
and \(f(x\mid a)=0\) for \(x<0\).
A numerically convenient form is obtained with \(z=x/a\):
\(f(x\mid a) = \dfrac{1}{a}\sqrt{\dfrac{2}{\pi}}\,z^2 e^{-z^2/2}.\)
CDF#
For \(x\ge 0\) (and \(F(x)=0\) for \(x<0\)):
\(F(x\mid a) = \operatorname{erf}\!\left(\dfrac{x}{a\sqrt{2}}\right) - \sqrt{\dfrac{2}{\pi}}\,\dfrac{x}{a}\,\exp\!\left(-\dfrac{x^2}{2a^2}\right),\)
where \(\operatorname{erf}(\cdot)\) is the Gaussian error function.
def _validate_scale(a: float) -> float:
a = float(a)
if not np.isfinite(a) or a <= 0:
raise ValueError("scale a must be a finite number > 0")
return a
def maxwell_pdf(x, a: float):
"""PDF of Maxwell(a) for x in R (returns 0 for x<0)."""
a = _validate_scale(a)
x = np.asarray(x, dtype=float)
pdf = np.zeros_like(x, dtype=float)
mask = x >= 0
z = x[mask] / a
pdf[mask] = np.sqrt(2 / np.pi) * (z**2) * np.exp(-0.5 * z**2) / a
return pdf
def maxwell_logpdf(x, a: float):
"""Log-PDF of Maxwell(a)."""
a = _validate_scale(a)
x = np.asarray(x, dtype=float)
logpdf = np.full_like(x, -np.inf, dtype=float)
mask = x > 0
z2 = (x[mask] / a) ** 2
logpdf[mask] = 0.5 * np.log(2 / np.pi) + 2 * np.log(x[mask]) - 3 * np.log(a) - 0.5 * z2
return logpdf
def maxwell_cdf(x, a: float):
"""CDF of Maxwell(a) for x in R (returns 0 for x<0)."""
a = _validate_scale(a)
x = np.asarray(x, dtype=float)
cdf = np.zeros_like(x, dtype=float)
mask = x >= 0
z = x[mask] / a
cdf[mask] = erf(z / np.sqrt(2)) - np.sqrt(2 / np.pi) * z * np.exp(-0.5 * z**2)
return cdf
# quick consistency check against SciPy
x_test = np.array([-1.0, 0.0, 0.3, 1.2, 3.0])
a_test = 1.7
np.allclose(maxwell_pdf(x_test, a_test), stats.maxwell.pdf(x_test, scale=a_test)) and np.allclose(
maxwell_cdf(x_test, a_test), stats.maxwell.cdf(x_test, scale=a_test)
)
True
4) Moments & properties#
Because Maxwell is a special case of the chi distribution, many quantities have closed forms.
For \(X\sim\mathrm{Maxwell}(a)\):
| Quantity | Value |
|---|---|
| Mean | \(\mathbb{E}[X] = 2a\sqrt{\frac{2}{\pi}}\) |
| Second moment | \(\mathbb{E}[X^2] = 3a^2\) |
| Variance | \(\mathrm{Var}(X) = a^2\left(3-\frac{8}{\pi}\right)\) |
| Mode | \(\sqrt{2}\,a\) |
| Skewness | \(\gamma_1 = \dfrac{2\sqrt{2}(16-5\pi)}{(3\pi-8)^{3/2}}\) |
| Excess kurtosis | \(\gamma_2 = \dfrac{4(-96+40\pi-3\pi^2)}{(3\pi-8)^2}\) (so kurtosis \(=3+\gamma_2\)) |
MGF#
The MGF exists for all real \(t\) and can be written using \(\operatorname{erf}\). With \(u = at\):
\(M_X(t) = e^{u^2/2}\,(1+u^2)\left(1+\operatorname{erf}\!\left(\tfrac{u}{\sqrt{2}}\right)\right) + \sqrt{\tfrac{2}{\pi}}\,u.\)
Characteristic function#
Substitute \(t=i\omega\) and use \(\operatorname{erf}(iz)=i\,\operatorname{erfi}(z)\) to get (with \(u = a\omega\))
\(\varphi_X(\omega) = e^{-u^2/2}\,(1-u^2)\left(1+i\,\operatorname{erfi}\!\left(\tfrac{u}{\sqrt{2}}\right)\right) + i\,u\sqrt{\tfrac{2}{\pi}}.\)
Entropy#
The differential entropy has a particularly clean form:
\(h(X) = \ln\!\left(a\sqrt{2\pi}\right) + \gamma - \tfrac{1}{2},\)
where \(\gamma \approx 0.5772\) is the Euler–Mascheroni constant.
EULER_GAMMA = 0.5772156649015328606
def maxwell_mean(a: float) -> float:
a = _validate_scale(a)
return 2 * a * np.sqrt(2 / np.pi)
def maxwell_var(a: float) -> float:
a = _validate_scale(a)
return a * a * (3 - 8 / np.pi)
def maxwell_mode(a: float) -> float:
a = _validate_scale(a)
return np.sqrt(2) * a
def maxwell_skewness() -> float:
pi = np.pi
return 2 * np.sqrt(2) * (16 - 5 * pi) / (3 * pi - 8) ** 1.5
def maxwell_excess_kurtosis() -> float:
pi = np.pi
return 4 * (-96 + 40 * pi - 3 * pi * pi) / (3 * pi - 8) ** 2
def maxwell_entropy(a: float) -> float:
a = _validate_scale(a)
return np.log(a) + 0.5 * np.log(2 * np.pi) + EULER_GAMMA - 0.5
def maxwell_mgf(t, a: float):
a = _validate_scale(a)
t = np.asarray(t, dtype=float)
u = a * t
return np.exp(0.5 * u**2) * (1 + u**2) * (1 + erf(u / np.sqrt(2))) + u * np.sqrt(2 / np.pi)
def maxwell_cf(omega, a: float):
a = _validate_scale(a)
omega = np.asarray(omega, dtype=float)
u = a * omega
return (
np.exp(-0.5 * u**2) * (1 - u**2) * (1 + 1j * erfi(u / np.sqrt(2)))
+ 1j * u * np.sqrt(2 / np.pi)
)
# Compare closed-form moments/entropy to SciPy for scale=1
m, v, s, k_excess = stats.maxwell.stats(moments="mvsk")
print('mean (ours, scipy):', maxwell_mean(1.0), m)
print('var (ours, scipy):', maxwell_var(1.0), v)
print('skew (ours, scipy):', maxwell_skewness(), s)
print('ex-kurtosis (ours, scipy):', maxwell_excess_kurtosis(), k_excess)
print('entropy (ours, scipy):', maxwell_entropy(1.0), stats.maxwell.entropy())
# Monte Carlo sanity check for the MGF at a few t values
x_mc = np.linalg.norm(rng.normal(0, 1.7, size=(200_000, 3)), axis=1)
for t in [-0.6, -0.2, 0.0, 0.3, 0.8]:
mc = np.mean(np.exp(t * x_mc))
closed = float(maxwell_mgf(t, 1.7))
print(f"t={t:+.1f} MC={mc:.6f} closed-form={closed:.6f} rel.err={(mc-closed)/closed:+.3e}")
mean (ours, scipy): 1.5957691216057308 1.5957691216057308
var (ours, scipy): 0.45352091052967447 0.45352091052967447
skew (ours, scipy): 0.4856928280495921 0.4856928280495921
ex-kurtosis (ours, scipy): 0.10816384281628826 0.10816384281629526
entropy (ours, scipy): 0.9961541981062054 0.9961541981062054
t=-0.6 MC=0.242522 closed-form=0.242496 rel.err=+1.084e-04
t=-0.2 MC=0.596076 closed-form=0.596124 rel.err=-8.092e-05
t=+0.0 MC=1.000000 closed-form=1.000000 rel.err=+0.000e+00
t=+0.3 MC=2.403010 closed-form=2.401649 rel.err=+5.664e-04
t=+0.8 MC=14.274996 closed-form=14.205933 rel.err=+4.861e-03
5) Parameter interpretation#
The Maxwell distribution has a single scale parameter \(a\).
Larger \(a\) shifts the distribution to the right and increases its spread.
\(a\) has the same physical units as \(x\).
Scaling rule: if \(Y\sim\mathrm{Maxwell}(1)\), then \(X=aY\sim\mathrm{Maxwell}(a)\). As a result:
\(\mathbb{E}[X]\) and the mode scale linearly with \(a\);
\(\mathrm{Var}(X)\) scales like \(a^2\);
skewness and kurtosis are independent of \(a\).
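A small sketch confirming the scaling rule with SciPy's moment helper: rescaled mean and variance should be identical across \(a\), and skewness/kurtosis should not move at all.

```python
import numpy as np
from scipy import stats

# Scaling rule: mean scales like a, variance like a^2,
# skewness and excess kurtosis are invariant in a.
rows = {}
for a in [0.5, 1.0, 3.0]:
    m, v, s, k = stats.maxwell.stats(scale=a, moments="mvsk")
    rows[a] = (m / a, v / a**2, s, k)
    print(f"a={a}: mean/a={m/a:.4f}  var/a^2={v/a**2:.4f}  skew={s:.4f}  ex.kurt={k:.4f}")
```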
In the Maxwell–Boltzmann interpretation, \(a\) sets the “typical speed” through temperature and mass: \(a = \sqrt{k_B T/m}\).
a_values = [0.5, 1.0, 2.0]
x_max = stats.maxwell.ppf(0.999, scale=max(a_values))
x_grid = np.linspace(0, x_max, 600)
fig = go.Figure()
for a in a_values:
fig.add_trace(
go.Scatter(
x=x_grid,
y=maxwell_pdf(x_grid, a),
mode="lines",
name=f"a={a}",
)
)
# annotate mean/mode for a=1
a0 = 1.0
fig.add_vline(x=maxwell_mode(a0), line_dash="dash", line_color="gray", annotation_text="mode (a=1)")
fig.add_vline(x=maxwell_mean(a0), line_dash="dot", line_color="gray", annotation_text="mean (a=1)")
fig.update_layout(
title="Maxwell PDF for different scale parameters",
xaxis_title="x",
yaxis_title="density f(x | a)",
legend_title="scale",
)
fig.show()
6) Derivations#
Expectation and variance via a Gamma connection#
Let \(X\sim\mathrm{Maxwell}(a)\) and define \(W=X^2\). From the Gaussian-norm construction,
\(W = a^2\left(Z_1^2+Z_2^2+Z_3^2\right) \sim a^2\,\chi^2_3.\)
A chi-square with 3 degrees of freedom is a Gamma random variable:
\(\chi^2_3 \sim \mathrm{Gamma}\!\left(\alpha=\tfrac{3}{2},\ \theta=2\right).\)
Therefore
\(W \sim \mathrm{Gamma}\!\left(\alpha=\tfrac{3}{2},\ \theta=2a^2\right).\)
From Gamma moments, \(\mathbb{E}[W]=\alpha\theta = 3a^2\), giving \(\mathbb{E}[X^2]=3a^2\).
To get \(\mathbb{E}[X]\), use the chi moment formula (or integrate directly):
\(\mathbb{E}[X] = a\sqrt{2}\,\frac{\Gamma(2)}{\Gamma(3/2)} = 2a\sqrt{\frac{2}{\pi}}.\)
Finally,
\(\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = 3a^2 - \frac{8a^2}{\pi} = a^2\left(3-\frac{8}{\pi}\right).\)
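A quick Monte Carlo check of the Gamma connection used in this derivation (the scale value is arbitrary):

```python
import numpy as np
from scipy import stats

a = 1.3
x = stats.maxwell.rvs(scale=a, size=200_000, random_state=0)

# X^2 should be Gamma(shape=3/2, scale=2*a^2)
gamma_ref = stats.gamma(a=1.5, scale=2 * a**2)
ks = stats.kstest(x**2, gamma_ref.cdf)
print("KS p-value:", ks.pvalue)
print("E[X^2] (MC vs 3a^2):", float(np.mean(x**2)), 3 * a**2)
```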
Likelihood and the MLE#
Given i.i.d. data \(x_1,\dots,x_n\) from \(\mathrm{Maxwell}(a)\) (with \(x_i\ge 0\)), the log-likelihood is
\(\ell(a) = \frac{n}{2}\ln\frac{2}{\pi} + 2\sum_{i=1}^n \ln x_i - 3n\ln a - \frac{1}{2a^2}\sum_{i=1}^n x_i^2.\)
Differentiate and set to zero:
\(\frac{\partial \ell}{\partial a} = -\frac{3n}{a} + \frac{1}{a^3}\sum_{i=1}^n x_i^2 = 0 \quad\Longrightarrow\quad \hat a = \sqrt{\frac{1}{3n}\sum_{i=1}^n x_i^2}.\)
def maxwell_loglik(a: float, x) -> float:
x = np.asarray(x, dtype=float)
if x.size == 0:
raise ValueError("need at least one observation")
if np.any(x < 0):
raise ValueError("Maxwell data must be >= 0")
return float(np.sum(maxwell_logpdf(x, a)))
def maxwell_mle(x) -> float:
x = np.asarray(x, dtype=float)
if x.size == 0:
raise ValueError("need at least one observation")
if np.any(x < 0):
raise ValueError("Maxwell data must be >= 0")
return float(np.sqrt(np.mean(x**2) / 3))
# demonstrate the MLE on synthetic data
true_a = 1.4
x = stats.maxwell.rvs(scale=true_a, size=5_000, random_state=0)
a_hat_closed = maxwell_mle(x)
loc_hat_scipy, scale_hat_scipy = stats.maxwell.fit(x, floc=0)
print('true a:', true_a)
print('MLE (closed form):', a_hat_closed)
print('SciPy fit (floc=0): loc, scale =', loc_hat_scipy, scale_hat_scipy)
print('loglik at true a:', maxwell_loglik(true_a, x))
print('loglik at MLE :', maxwell_loglik(a_hat_closed, x))
true a: 1.4
MLE (closed form): 1.3809148077229623
SciPy fit (floc=0): loc, scale = 0 1.380919232732261
loglik at true a: -6571.750760746983
loglik at MLE : -6568.950376805914
7) Sampling & simulation#
Algorithm (NumPy only)#
Use the defining construction: if \(V_x,V_y,V_z \overset{\text{iid}}{\sim} \mathcal{N}(0,a^2)\),
then \(X = \sqrt{V_x^2 + V_y^2 + V_z^2}\) is \(\mathrm{Maxwell}(a)\).
So sampling is straightforward:
Draw a matrix \(Z\in\mathbb{R}^{n\times 3}\) with i.i.d. standard normal entries.
Multiply by \(a\) to get the correct scale.
Take row-wise Euclidean norms.
This is fast, stable, and avoids any special functions.
def maxwell_rvs_numpy(n: int, a: float, rng: np.random.Generator | None = None):
"""Sample n i.i.d. Maxwell(a) values using NumPy only."""
a = _validate_scale(a)
if rng is None:
rng = np.random.default_rng()
n = int(n)
if n < 0:
raise ValueError("n must be >= 0")
z = rng.standard_normal(size=(n, 3))
return a * np.linalg.norm(z, axis=1)
# quick simulation check
sim_a = 1.25
x_sim = maxwell_rvs_numpy(200_000, sim_a, rng=rng)
print('theory mean/var:', maxwell_mean(sim_a), maxwell_var(sim_a))
print('MC mean/var :', float(np.mean(x_sim)), float(np.var(x_sim)))
theory mean/var: 1.9947114020071635 0.7086264227026163
MC mean/var : 1.9932654141116128 0.7103638546989847
8) Visualization#
We’ll plot
the PDF and CDF for multiple \(a\) values, and
a Monte Carlo histogram with the theoretical PDF overlay.
(Always label axes with units in real applications; here we use abstract units.)
# CDF curves
fig = go.Figure()
for a in [0.75, 1.25, 2.0]:
x_max = stats.maxwell.ppf(0.999, scale=a)
x_grid = np.linspace(0, x_max, 600)
fig.add_trace(go.Scatter(x=x_grid, y=maxwell_cdf(x_grid, a), mode="lines", name=f"a={a}"))
fig.update_layout(title="Maxwell CDF for different scale parameters", xaxis_title="x", yaxis_title="F(x | a)")
fig.show()
# Histogram + PDF overlay
a = 1.25
samples = maxwell_rvs_numpy(50_000, a, rng=rng)
x_grid = np.linspace(0, stats.maxwell.ppf(0.999, scale=a), 600)
hist = px.histogram(samples, nbins=70, histnorm="probability density", opacity=0.7)
hist.add_trace(go.Scatter(x=x_grid, y=maxwell_pdf(x_grid, a), mode="lines", name="theory PDF"))
hist.update_layout(
title=f"Monte Carlo samples vs Maxwell PDF (a={a})",
xaxis_title="x",
yaxis_title="density",
)
hist.show()
9) SciPy integration#
SciPy exposes Maxwell as scipy.stats.maxwell.
Useful methods:
`stats.maxwell.pdf(x, loc=0, scale=a)`
`stats.maxwell.cdf(x, loc=0, scale=a)`
`stats.maxwell.rvs(loc=0, scale=a, size=n, random_state=...)`
`stats.maxwell.fit(data, floc=0)` to estimate `scale` (fixing `loc=0` is usually appropriate for true speeds)
a = 1.6
x_grid = np.linspace(0, stats.maxwell.ppf(0.999, scale=a), 7)
print('pdf:', stats.maxwell.pdf(x_grid, scale=a))
print('cdf:', stats.maxwell.cdf(x_grid, scale=a))
# sampling
x_scipy = stats.maxwell.rvs(scale=a, size=5, random_state=123)
print('rvs:', x_scipy)
# fitting (fix loc to 0)
data = stats.maxwell.rvs(scale=1.2, size=2_000, random_state=0)
loc_hat, scale_hat = stats.maxwell.fit(data, floc=0)
print('fit loc, scale:', loc_hat, scale_hat)
print('closed-form MLE:', maxwell_mle(data))
pdf: [0. 0.1798 0.3651 0.2655 0.0971 0.0199 0.0024]
cdf: [0. 0.0707 0.3867 0.7456 0.9351 0.9898 0.999 ]
rvs: [1.3253 3.6552 1.8196 4.5334 0.3075]
fit loc, scale: 0 1.1840633901945228
closed-form MLE: 1.1840600406715882
10) Statistical use cases#
Hypothesis testing (goodness-of-fit)#
If you have observed speeds and want to check whether a Maxwell model is reasonable, common diagnostics include:
QQ plots (empirical quantiles vs theoretical quantiles)
KS-style goodness-of-fit tests (with caveats when parameters are estimated)
Bayesian modeling (a simple conjugate trick)#
Because \(X^2\) is Gamma, we can build a convenient Bayesian model for the scale.
Let \(\beta = 1/a^2\) and \(Y_i = X_i^2/2\).
Then \(Y_i \mid \beta \sim \mathrm{Gamma}(\alpha=3/2,\ \text{rate}=\beta)\) with likelihood
\(\prod_{i=1}^n p(y_i \mid \beta) \propto \beta^{3n/2}\, e^{-\beta \sum_i y_i}.\)
A Gamma prior on \(\beta\) is conjugate:
\(\beta \sim \mathrm{Gamma}(\alpha_0,\ \text{rate}=b_0) \ \Longrightarrow\ \beta \mid y \sim \mathrm{Gamma}\!\left(\alpha_0 + \tfrac{3n}{2},\ \text{rate}=b_0 + \sum_i y_i\right).\)
Generative modeling#
To generate isotropic 3‑D velocity vectors with Maxwell speed distribution:
sample a speed \(S\sim\mathrm{Maxwell}(a)\),
sample a direction \(U\) uniformly on the unit sphere,
output \(V = S\,U\).
(Equivalently: sample \(V_x,V_y,V_z\sim\mathcal{N}(0,a^2)\) and take \(\lVert V\rVert\) for the speed.)
# Hypothesis-testing style diagnostics on synthetic data
true_a = 1.1
x = stats.maxwell.rvs(scale=true_a, size=1_000, random_state=0)
# Fit scale (loc fixed)
loc_hat, a_hat = stats.maxwell.fit(x, floc=0)
# KS test (caveat: parameters are estimated from the same data)
ks = stats.kstest(x, lambda t: stats.maxwell.cdf(t, loc=0, scale=a_hat))
print('fitted a:', a_hat)
print('KS statistic, p-value:', ks.statistic, ks.pvalue)
# QQ plot
n = x.size
probs = (np.arange(1, n + 1) - 0.5) / n
x_sorted = np.sort(x)
q_theory = stats.maxwell.ppf(probs, scale=a_hat)
fig = go.Figure()
fig.add_trace(go.Scatter(x=q_theory, y=x_sorted, mode='markers', name='data'))
min_q = float(min(q_theory.min(), x_sorted.min()))
max_q = float(max(q_theory.max(), x_sorted.max()))
fig.add_trace(go.Scatter(x=[min_q, max_q], y=[min_q, max_q], mode='lines', name='45° line'))
fig.update_layout(title='Maxwell QQ plot (fit on data)', xaxis_title='theoretical quantiles', yaxis_title='sample quantiles')
fig.show()
fitted a: 1.08079599885578
KS statistic, p-value: 0.023420813354591785 0.634194125013562
# Bayesian update for beta = 1/a^2 using Y = X^2/2
x = stats.maxwell.rvs(scale=1.3, size=300, random_state=1)
y = x**2 / 2
alpha = 1.5
# prior beta ~ Gamma(alpha0, rate=b0)
alpha0 = 2.0
b0 = 1.0
alpha_post = alpha0 + alpha * y.size
b_post = b0 + float(np.sum(y))
# sample from posterior of beta (NumPy uses shape-scale; scale = 1/rate)
beta_samples = rng.gamma(shape=alpha_post, scale=1 / b_post, size=200_000)
a_samples = 1 / np.sqrt(beta_samples)
ci = np.quantile(a_samples, [0.05, 0.5, 0.95])
print('posterior a: 5% / 50% / 95% quantiles:', ci)
fig = px.histogram(a_samples, nbins=80, histnorm='probability density', opacity=0.8)
fig.update_layout(title='Posterior over a (via beta=1/a^2 conjugacy)', xaxis_title='a', yaxis_title='density')
fig.show()
posterior a: 5% / 50% / 95% quantiles: [1.2707 1.3204 1.3733]
# Generative modeling: isotropic 3-D velocities via (speed, direction)
a = 1.0
n = 50_000
# speed from Maxwell
speed = maxwell_rvs_numpy(n, a, rng=rng)
# direction: normalize 3-D standard normals -> uniform on sphere
u = rng.standard_normal(size=(n, 3))
u /= np.linalg.norm(u, axis=1, keepdims=True)
v = speed[:, None] * u
# Check: component marginals look Gaussian with variance a^2
vx = v[:, 0]
print('empirical mean(vx), var(vx):', float(np.mean(vx)), float(np.var(vx)))
x_grid = np.linspace(-4 * a, 4 * a, 400)
fig = px.histogram(vx, nbins=80, histnorm='probability density', opacity=0.7, title='One velocity component (should be ~ N(0, a^2))')
fig.add_trace(go.Scatter(x=x_grid, y=stats.norm.pdf(x_grid, loc=0, scale=a), mode='lines', name='N(0,a^2)'))
fig.update_layout(xaxis_title='v_x', yaxis_title='density')
fig.show()
empirical mean(vx), var(vx): 0.0023012231114213485 1.005287111232994
11) Pitfalls#
Parameterization confusion: many texts use \(\sigma\) instead of \(a\), or rescale by \(\sqrt{2}\); check the exponent term \(\exp(-x^2/(2a^2))\) to confirm.
`loc` in SciPy: `stats.maxwell` includes a `loc` shift. For genuine speeds, you almost always want `loc=0` and should fit with `floc=0`.
Non-negativity: Maxwell is supported on \([0,\infty)\). If measurement noise produces negative values, you need a measurement model (not just a Maxwell fit).
Model mismatch: Maxwell assumes isotropic Gaussian components. Drift, anisotropy, heavy tails, or mixtures can break this.
Numerics: direct PDFs can underflow for very large \(x/a\); prefer `logpdf` for likelihood work. For extreme tail probabilities, use SciPy’s survival function (`sf`) rather than `1 - cdf`.
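A minimal sketch of the underflow pitfall: deep in the tail the direct PDF evaluates to exactly zero in double precision, while the closed-form log-PDF stays finite and usable for likelihood work.

```python
import numpy as np
from scipy import stats

a = 1.0
x = 60.0  # far in the right tail

# Direct PDF underflows to 0.0 in double precision...
p_direct = stats.maxwell.pdf(x, scale=a)

# ...but the closed-form log-PDF stays finite:
# log f = 0.5*log(2/pi) + 2*log(x) - 3*log(a) - x^2/(2*a^2)
log_p = 0.5 * np.log(2 / np.pi) + 2 * np.log(x) - 3 * np.log(a) - 0.5 * (x / a) ** 2
print(p_direct, log_p)  # 0.0 and roughly -1792
```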
12) Summary#
Maxwell is a continuous distribution on \([0,\infty)\) that models the norm of a 3‑D isotropic Gaussian.
It is a special case of the chi distribution (\(k=3\)), and \(X^2\) is Gamma, which makes many results easy.
Key facts: \(\mathbb{E}[X]=2a\sqrt{2/\pi}\), \(\mathrm{Var}(X)=a^2(3-8/\pi)\), and the MLE is \(\hat a = \sqrt{\tfrac{1}{3n}\sum x_i^2}\).
Sampling is simple with NumPy only: draw three independent normals and take the Euclidean norm.
SciPy’s `stats.maxwell` provides `pdf`/`cdf`/`rvs`/`fit`; remember to fix `loc=0` for speed data.
References
`scipy.stats.maxwell` documentation
Maxwell–Boltzmann distribution (kinetic theory)
Chi and chi-square distributions (Gaussian norms)